Sustainable growth in complex networks 
Based on the empirical analysis of the dependency network in 18 Java projects, we develop a
novel model of network growth which considers both an attachment mechanism and the addition
of new nodes with a heterogeneous distribution of their initial degree, $k_0$. Empirically we find
that the cumulative degree distributions of initial degrees and of the final network follow power-law
behaviors: $P(k_0) \propto k_0^{1-\alpha}$ and $P(k) \propto k^{1-\gamma}$, respectively. For the total number of links as
a function of the network size, we find empirically $K(N) \propto N^{\beta}$, where $\beta$ is (at the beginning of
the network evolution) between 1.25 and 2, while converging to $\sim 1$ for large $N$. This indicates
a transition from a growth regime with increasing network density towards a sustainable regime,
which prevents a collapse because of ever increasing dependencies. Our theoretical framework is able
to predict relations between the exponents $\alpha$, $\beta$, $\gamma$, which also link issues of software engineering
and developer activity. These relations are verified by means of computer simulations and empirical
investigations. They indicate that the growth of real Open Source Software networks occurs on the
edge between two regimes, which are dominated either by the initial degree distribution of added
nodes or by the preferential attachment mechanism. Hence, the heterogeneous degree distribution
of newly added nodes, found empirically, is essential to describe the laws of sustainable growth in
networks.

PACS numbers: 89.75.Hc, 05.40.-a, 89.20.Ff



How do real networks grow? Nodes and links are not 
always added at a constant rate. Instead, their num- 
bers could be drawn from a broad distribution, spanning 
almost the size of the network. This inhomogeneity con- 
siderably impacts the network growth, but it was not 
covered in existing analytical approaches. Hence, this 
problem is addressed in this Letter. We provide a novel 
model of network growth which is solved analytically and 
verified empirically by studying the evolution of several 
Open Source Software projects. 

During its evolution, a network can have a non-linear
growth of its set of nodes or edges. However, many modeling
approaches, most notably preferential attachment,
simply assume that (i) at any time step a constant
number of nodes is added to the network, that (ii) each
new node is linked to the network with a constant number
of links, and that (iii) neither nodes nor links are deleted
[1-3]. If such assumptions hold, this would result in a
growth $N(\tau) \propto \tau^{\eta}$ of the total number of nodes in the
network, and $K(\tau) \propto \tau^{\lambda}$ of the total number of links,
where both $\eta \sim 1$ and $\lambda \sim 1$. Such a network growth
could be called sustainable, in contrast to the two limiting
cases of (a) accelerated growth [4-6], if $\lambda/\eta > 1$, or of
(b) saturated growth, if $\lambda/\eta < 1$. Neither of these growth
processes is sustainable in the long run, as they lead
either to collapse or to stagnation [7]. But there is,
at least for the intermediate observable time scales, also
empirical evidence of networks growing with increasing
link density $K/N$, for example the World Wide Web [8].

However, results obtained for $N(\tau)$ or $K(\tau)$ refer to
macroscopic properties, which are compatible with a
large variety of 'microscopic' assumptions about node
and link additions (or deletions). More importantly, the
kinetic exponents may change over time and may reach 1
only asymptotically, which would point to changes in the
growth mechanism on intermediate time scales. Finally,
in addition to the total number of nodes and links,
there are other characteristics of the network structure
and dynamics which need to be predicted and verified
empirically. In this Letter, we address these problems
both theoretically and empirically by (i) developing
a detailed model of network growth which includes the
heterogeneous degree distribution of newly added nodes
(instead of adding nodes with the same degree), and (ii)
verifying the predictions of our general model against
a novel data set of growing networks.

We start by describing the empirical findings, which motivate
the new assumptions of our network growth model introduced
later. We have used a dataset of 18 Open Source Software
(OSS) projects (see Table I), which are programmed in
Java. The complex network consists of nodes, which 
are Java classes (each file corresponds to one class), and 
directed links representing dependencies between these 
classes. For example, one class can call a function de- 
fined in another class, or extend a functionality of another 
class. During software evolution, new classes are added 
to the project and are linked to existing classes based on 
principles defined in software engineering. So, if we are 
able to reveal universal dynamics underlying such growth 
processes, this is a remarkable result on its own. For the 
time dependent evolution of the OSS projects, we can 
rely on version control systems which record all changes 
made. For our analysis, we have used snapshots of inter- 
vals of 30 days, for a project life span between 2.7 and 8.2 
years, which goes well beyond the few snapshots available
for previous investigations of OSS growth [9, 10].

FIG. 1. (Color online) (left) Final complementary cumulative degree distribution $P(k) \propto k^{1-\gamma}$; (right) initial complementary
cumulative degree distribution $P(k_0) \propto k_0^{1-\alpha}$; (center) total number of links $K(N) \propto N^{\beta}$ as a function of network size $N$. Colors
indicate four different OSS projects: Architecturware (black circles), Eclipse (blue squares), JEdit (violet stars), Sapia (magenta
triangles). See Table I for more details. The small symbols represent the complete empirical datasets, the large ones the
binned data. The inset in the central panel shows the evolution of $\beta$ for the 18 projects during the growth process. In this
inset, the different symbols represent the evolution of $\beta$ for the different projects, while the dashed line represents the median
of $\beta$ for the complete set.



Nevertheless, we may use these studies as a point of ref- 
erence, as they also study some topological properties, 
such as the cumulative degree distribution P(k). 

In order to derive analytical results about the latter,
let us define $n(k,\tau)$ as the degree distribution, i.e. the
number of nodes with total degree $k$ at time $\tau$. Obviously
$K(\tau) = \sum_{k \geq 1} k\, n(k,\tau)$. The complementary cumulative
degree distribution at time $\tau$ is then given by
$P(k,\tau) = 1 - \sum_{l<k} n(l,\tau)/N(\tau)$. We can remove the
real time $\tau$ by using the scaling $\tau \propto N^{1/\eta}$, which means
$K(N) \propto N^{\beta}$, where $\beta = \lambda/\eta$. This procedure implies
that the number of nodes is increasing, i.e. the deletion of
nodes is not considered in the empirical analysis. Figure 1
illustrates the empirical results for these quantities by
showing four OSS projects of very different size.
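
As an illustration of these definitions, the following Python sketch computes the complementary cumulative degree distribution and the total degree from a plain list of node degrees. The function names and the toy degree sequence are assumptions made for this example; they are not taken from the original analysis.

```python
from collections import Counter


def complementary_cumulative(degrees):
    """Return {k: P(k)} with P(k) = 1 - sum_{l<k} n(l) / N.

    P(k) is the share of nodes whose degree is at least k.
    """
    n_total = len(degrees)
    counts = Counter(degrees)      # n(k): number of nodes with degree k
    ccdf = {}
    below = 0                      # running sum of n(l) for l < k
    for k in sorted(counts):
        ccdf[k] = 1.0 - below / n_total
        below += counts[k]
    return ccdf


def total_degree(degrees):
    """K = sum_k k * n(k), following the definition in the text."""
    return sum(degrees)


# Toy degree sequence (hypothetical, not project data).
degrees = [1, 1, 2, 3, 3, 3, 5]
ccdf = complementary_cumulative(degrees)
for k, p in ccdf.items():
    print(k, round(p, 3))
```

Plotting `ccdf` on doubly logarithmic axes would show a straight line with slope $1-\gamma$ if the degree distribution follows a power law.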

Looking at the final complementary cumulative degree
distribution $P(k)$, obtained for the maximum $N$ of the
project, we clearly identify a power law $P(k) \propto k^{1-\gamma}$
(left panel of Fig. 1), which is equivalent to a degree
distribution $n(k) \propto k^{-\gamma}$. The exponents $\gamma$, which
characterize the structure of the final product, are given in Table I.
Depending on the size of the project, we find values
between 2 and 3, with a clear tendency towards values
closer to 3. For the growth of the OSS projects (center
panel of Fig. 1), we obtain slightly bent curves for the
projects shown, which indicate that the exponents $\beta$ changed
over time (shown in the inset of the center panel). For every
project, the total degree as a function of system size
was split into different windows (of size 500) and an
estimation of the exponent $\beta$ was performed for each window.
Starting at values between 1.25 and 2, they converge to
smaller values of about 1, i.e. we observe a transition
from accelerated to sustainable growth. The final values
of $\beta$ are given in Table I. Note that $\beta$ is a measure of the
activities of the developers, i.e. it characterizes a social
process. The right panel of Fig. 1 finally presents the
most interesting empirical finding: in contrast to the
above-mentioned assumptions of preferential attachment
and of most modeling approaches, newly added nodes
have a very heterogeneous initial degree $k_0$. In fact we
observe a power law for the complementary cumulative
initial degree distribution $P(k_0) \propto k_0^{1-\alpha}$, where $\alpha$ is
related to the initial conditions of the software growth, i.e.
to software design. The values found are presented in
Table I. It remains to reveal the inherent relations
between the three exponents $\alpha$, $\beta$, $\gamma$, which is done by the
following analytical approach.
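
The windowed estimation of $\beta$ described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the windows are taken as consecutive intervals of width 500 in the network size $N$, the exponent is obtained by an ordinary least-squares fit in log-log space, and the snapshot data are synthetic. The fitting procedure actually used for the empirical data is not specified in the text.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return sxy / sxx


def windowed_beta(sizes, links, window=500):
    """Estimate beta in K ~ N^beta separately in consecutive windows of N.

    Returns a list of (window_start, beta) pairs; windows holding fewer
    than two snapshots are skipped because no slope can be fitted.
    """
    estimates = []
    start, stop = min(sizes), max(sizes)
    while start < stop:
        idx = [i for i, n in enumerate(sizes) if start <= n < start + window]
        if len(idx) >= 2:
            beta = loglog_slope([sizes[i] for i in idx],
                                [links[i] for i in idx])
            estimates.append((start, beta))
        start += window
    return estimates


# Synthetic snapshots (assumption): K = 2 * N^1.5 exactly.
sizes = list(range(100, 2100, 50))
links = [2.0 * n ** 1.5 for n in sizes]
betas = windowed_beta(sizes, links)
for start, beta in betas:
    print(start, round(beta, 3))
```

For real project data the per-window estimates would be expected to drift from larger values towards 1 as $N$ grows, which is the transition reported in the inset of Fig. 1.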

We assume that nodes are added to the project at a
constant rate, i.e. time $t$ is given by the total number of
nodes, $t = N$, or equivalently, $\eta = 1$. For the dynamics
of the degree distribution we postulate the following rate
equation:

$$\dot{n}(k,t) = \delta_{k,k_0(t)} + n(k-1,t)\,\omega[k-1 \to k] + n(k+1,t)\,\omega[k+1 \to k] - n(k,t)\,\{\omega[k \to k-1] + \omega[k \to k+1]\} \qquad (1)$$

This is a first-order approximation of the dynamics
based on the addition/deletion of one node at a time.
The term $\delta_{k,k_0(t)}$ in Eq. (1) describes the addition of
a new node with an initial degree exactly equal to $k$.
In accordance with our empirical findings, this degree
is randomly drawn from a truncated power-law distribution
$g(k_0)$ with exponent $\alpha$; i.e. $\mathrm{Prob}[k_0(t) = k] = \min\left((\alpha-1)/k^{\alpha},\, t-1\right)$.
For the transition rate of growth
processes, $k \to k+1$, we assume



[k ->■ k + 1] = 



fco (t) 
K(t) 



£)>*■■ 



(2) 



This rate is proportional to $k$, i.e. it is based on preferential
attachment. Without that assumption, the process
would result in a single-scale network, which is not in
accordance with the empirical studies above. The preferential
attachment can occur by means of two different
processes. The first one occurs if a newly added node
brings $k_0(t)$ new links to existing nodes, which are selected
with a probability proportional to their relative degree
$k/K$. The second process describes the addition of links
between existing nodes, where $\sigma$ and $r$ are constants
described below.

Project            N       α         β         γ
eclipse            28898   2.7(1)    1.06(4)   2.6(1)
springframework    7707    3.5(1)    1.02(4)   2.9(1)
fudaa              7610    2.7(1)    1.1(1)    2.7(1)
jpox               7259    2.49(8)   1.08(2)   2.44(8)
architecturware    7110    2.7(1)    1.00(3)   2.8(1)
jena               6619    3.5(1)    0.99(3)   2.9(1)
hibernate          5938    2.5(2)    1.03(3)   2.5(1)
sapia              4129    3.44(8)   1.00(2)   3.0(1)
rodin-b-sharp      4077    2.8(1)    1.03(2)   2.6(1)
azureus            4051    2.9(2)    1.14(5)   2.6(2)
jedit              3997    2.9(1)    1.01(1)   2.93(8)
jaffa              3854    3.0(3)    1.1(1)    2.7(3)
jmlspecs           3590    2.4(2)    0.97(6)   2.6(2)
openxava           3000    3.2(2)    1.04(4)   2.9(2)
phpeclipse         2881    2.8(1)    1.02(2)   2.73(8)
personalaccess     2687    3.1(1)    0.95(6)   2.9(1)
xmsf               2576    2.2(1)    1.08(3)   2.3(1)
aspectj            1856    2.5(1)    1.03(4)   2.5(1)

TABLE I. Empirical results obtained for 18 Open Source Java
projects. N gives the maximum number of nodes (classes) at
the date of the last snapshot taken; α, γ are the exponents
for the initial and final degree distribution. β is the value of
the exponent describing the asymptotic growth of the total
number of links as a function of network size.

The transition rate corresponding to the
deletion of links, $k \to k-1$, is also assumed to be
proportional to the degree of the node,



)[k->k-l] = {a- r -\k. 



(3) 



Let us now elucidate the emerging dynamics by splitting
it into two different limiting cases. The first one
occurs when the growth of the network based on the
addition of nodes with heterogeneous degree $k_0$ does not play
any role, i.e. $k_0$ can be set to zero for every time step.
In this case the dynamics is only governed by the
addition/deletion of links distributed between existing nodes,
which follows the preferential attachment/deletion rule.
Then, the rate equation (1), in the continuous limit and
for large $N$, can be easily translated into the following
Fokker-Planck equation:



$$\partial_t n(k,t) = r\,\partial_k \left[ k\, n(k,t) \right] + \frac{\sigma^2}{2}\,\partial_{kk} \left[ k^2\, n(k,t) \right] \qquad (4)$$



which is equivalent to the following Langevin dynamics
for the degree $k_i$ of a single node $i$:

$$\dot{k}_i(t) = -r\,k_i(t) + \sigma\,k_i(t)\,\xi_i(t) \qquad (5)$$

FIG. 2. (upper panel) Exponents $\gamma$ of the power-law degree
distribution of the final network, (lower panel) exponents $\beta$
for the growth of the total number of links, as a function of
the exponent $\alpha$ of the initial degree distribution. The different
thin lines correspond to simulations of the model described,
for various network sizes ($N = 2 \times 10^3$, dotted line, to $N = 10$,
dashed line). The thick lines indicate the analytical
results of Eqs. (9) and (12). Marks with error bars correspond
to the empirical results for the 18 projects, given in Table I.

This describes the known law of proportional growth
[11, 12], where $r$ is the mean growth (drift) and $\sigma$ is
the variance of the normalized random force $\xi_i(t)$. It is
well known [13] that such processes in the long run lead
to a power-law distribution $n(k) \propto k^{-\gamma}$, i.e. to Zipf's
law for the cumulative degree distribution $P(k) \propto k^{1-\gamma}$,
with $\gamma$ equal to 2. In fact, Zipf's law was empirically
confirmed for the in-degree distribution of Linux packages ;&]
as well as for Java projects [10]. However, the out-degree
distribution, at least for the latter dataset, clearly follows
a log-normal distribution. After all, because this
limiting case only considers the growth of the number of
links, but not of the number of nodes, it only provides
a limited understanding of real software dynamics (and
random addition/deletion processes are only one of many
different ways to obtain Zipf's law).

Therefore, in this Letter, we are more interested in the
second limiting case, which ignores the addition/deletion
of links among existing nodes (i.e. $\sigma$, $r$ are negligible),
while emphasizing the network growth based on the
addition of nodes with a broad initial degree distribution,
$g(k_0)$. This assumption is fully justified in the case of
broad distributions of initial degrees, as found empirically.
This dynamics is fully described by the following
set of equations:

$$\dot{n}(1,t) = \delta_{1,k_0(t)} - \frac{k_0(t)}{K(t)}\, n(1,t)$$

$$\dot{n}(k,t) = \delta_{k,k_0(t)} + \frac{k_0(t)}{K(t)} \left\{ (k-1)\, n(k-1,t) - k\, n(k,t) \right\} \qquad (6)$$

and the initial condition $n(k,0) = n_0\,\delta_{k,n_0-1}$, i.e. initially
a small number of nodes (e.g. $n_0 = 2$) with a degree
of $n_0 - 1$ is assumed, which describes a small, fully
connected network to start with. From this set of equations,
we first derive the dynamics for the total number
of links, $K(t)$. By definition, for a single network realization
$\dot{K}(t) = k_0(t)$ holds. The ensemble average $\langle \dot{K}(t) \rangle$
over many realizations of the network growth process is
then given by:



$$\langle \dot{K}(t) \rangle = \langle k_0 \,|\, k_0 < t \rangle + t \cdot \mathrm{Prob}[k_0(t) \geq t]. \qquad (7)$$

The first term represents the expected value of $k_0(t)$
restricted to $k_0(t) < t$, which applies if the number drawn
from the distribution $g(k_0)$ is lower than the current
network size ($t = N$) and, thus, the newly added node is
able to establish as many links as drawn from the
distribution. If this is not the case, i.e. $k_0(t) \geq t$, the node can
only create at most $t - 1$ links, which is described by the
second term. By recasting the power-law distribution for
$g(k_0)$, we get after some calculation:



$$\langle \dot{K}(t) \rangle \propto \int_{1}^{t} dk\, g(k)\, k + t \int_{t}^{\infty} dk\, g(k) \qquad (8)$$

Asymptotically, we find that the total number of links
grows in time or with network size $t = N$, respectively,
as a power law, $K(t) \propto t^{\beta}$, with the exponent



$$\beta = \begin{cases} 3 - \alpha & \text{if } \alpha < 2 \\ 1 & \text{if } \alpha > 2 \end{cases} \qquad (9)$$
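
The asymptotic exponent can be checked numerically. The sketch below is an illustration only: it assumes the continuous density $g(k) = (\alpha-1)k^{-\alpha}$ for $k \geq 1$, evaluates the two integrals of Eq. (8) in closed form, sums the increments to obtain $\langle K(t) \rangle$, and reads off the slope between two hypothetical sizes `t1` and `t2`. All function names are chosen for this example.

```python
import math


def mean_link_increment(t, alpha):
    """<dK/dt> from Eq. (8) for g(k) = (alpha - 1) * k**(-alpha), k >= 1.

    Continuous approximation; alpha must differ from 2.
    """
    # integral_1^t k g(k) dk
    below = (alpha - 1.0) / (2.0 - alpha) * (t ** (2.0 - alpha) - 1.0)
    # t * integral_t^inf g(k) dk = t * t**(1 - alpha)
    above = t * t ** (1.0 - alpha)
    return below + above


def growth_exponent(alpha, t1=10_000, t2=100_000):
    """Effective beta from K(t2)/K(t1), where K(t) sums the mean increments."""
    total = 0.0
    k_at_t1 = None
    for t in range(1, t2 + 1):
        total += mean_link_increment(float(t), alpha)
        if t == t1:
            k_at_t1 = total
    return math.log(total / k_at_t1) / math.log(t2 / t1)


for alpha in (1.5, 3.0):
    print(alpha, round(growth_exponent(alpha), 3))
```

For $\alpha = 1.5$ the estimate lies close to $3 - \alpha = 1.5$, and for $\alpha = 3$ it lies close to 1. Near $\alpha = 2$ the convergence is slow, so finite-size estimates fall between the two branches.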



By applying the ensemble average to Eqs. (6), we
are further able to find a mean-field approximation for
the dynamics of the degree distribution $n(k,t)$. Using
$\langle \delta_{k,k_0(t)} \rangle = \mathrm{Prob}[k = k_0(t)] = (\alpha-1)/k^{\alpha}$ and similar
arguments as in Eqs. (8)-(9), we find that



$$\langle k_0(t) \rangle = t^{2-\alpha} + \frac{\alpha-1}{\alpha-2} \left( 1 - t^{2-\alpha} \right) \qquad (10)$$



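By analyzing the solution of Eqs. (6)-(10) we find
two different regimes for the ratio $\langle k_0(t) \rangle / \langle K(t) \rangle$: (i) if
$\alpha > 2$, then $\langle k_0(t) \rangle \propto (\alpha-1)/(\alpha-2)$ and $\langle K(t) \rangle \propto
t\,(\alpha-1)/(\alpha-2)$; (ii) if $\alpha < 2$, $\langle k_0(t) \rangle \propto t^{2-\alpha}/(2-\alpha)$
and $\langle K(t) \rangle \propto t^{3-\alpha}$. Both regimes, however, yield an
identical result, i.e. $\langle k_0(t) \rangle / \langle K(t) \rangle = \zeta(\alpha)/t$, with $\zeta(\alpha)$ being
a normalization constant. Thus, we can rewrite Eqs. (6) as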



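$$\langle \dot{n}(1,t) \rangle = (\alpha-1) - \frac{\zeta(\alpha)}{t}\, \langle n(1,t) \rangle$$

$$\langle \dot{n}(k,t) \rangle = \frac{\alpha-1}{k^{\alpha}} + \frac{\zeta(\alpha)}{t} \left[ (k-1)\, \langle n(k-1,t) \rangle - k\, \langle n(k,t) \rangle \right] \qquad (11)$$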



These equations reveal a competition between two different
processes: the growth of the network caused by
the addition of nodes with a broad initial degree distribution
(first term) and the growth of a node's degree
caused by a mechanism akin to preferential attachment
(second term). If $\alpha$ is small, the first process dominates
and the expected degree distribution is simply given by
$\langle n(k,t) \rangle = t\,(\alpha-1)/k^{\alpha}$. On the other hand, if $\alpha$ is large
and the addition of new nodes with a heterogeneous initial
degree distribution can be neglected, we recover the
usual Barabási-Albert model with $n(k,t) \propto k^{-3}$. Thus,
we have found two different regimes for the final degree
distribution, which depend on the exponent $\alpha$ of the initial
degree distribution:



$$\gamma = \begin{cases} \alpha & \text{if } \alpha < 3 \\ 3 & \text{if } \alpha > 3 \end{cases} \qquad (12)$$
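
The two piecewise relations, Eqs. (9) and (12), are easy to evaluate side by side with measured values. The helper below is a small Python sketch; the function names are chosen for this example, and the three sample rows are copied from Table I (central values only, without uncertainties).

```python
def predicted_beta(alpha):
    """Eq. (9): growth exponent of the total number of links."""
    return 3.0 - alpha if alpha < 2.0 else 1.0


def predicted_gamma(alpha):
    """Eq. (12): exponent of the final degree distribution."""
    return alpha if alpha < 3.0 else 3.0


# (alpha, beta, gamma) central values for three projects from Table I.
table_rows = {
    "jedit": (2.9, 1.01, 2.93),
    "sapia": (3.44, 1.00, 3.0),
    "xmsf": (2.2, 1.08, 2.3),
}

for name, (alpha, beta_emp, gamma_emp) in table_rows.items():
    print(name,
          "beta:", predicted_beta(alpha), "vs", beta_emp,
          "gamma:", predicted_gamma(alpha), "vs", gamma_emp)
```

Both expressions are continuous at their boundaries ($\alpha = 2$ and $\alpha = 3$), so the choice of branch at exactly those values does not matter in this sketch.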



To conclude, our analytical approach has provided a firm
relation between the three different exponents $\alpha$, $\beta$, $\gamma$,
which can be tested in two different ways: (a) by computer
simulations of the full dynamics for various network
sizes $N$ and initial conditions ($\alpha$), (b) by comparison with
the empirical findings from the 18 OSS projects. The
results are shown in Figure 2. They confirm that the
analytical approximations are indeed valid and in good
agreement both with the computer simulations and the
empirical results. Most interestingly, they reveal that
the growth dynamics of real OSS networks is on the edge
between two regimes: for $\alpha < 3$, the initial degree distribution
and hence the addition of new nodes would dominate
the whole growth process, whereas for $\alpha > 3$ the
preferential attachment of links between existing nodes
would dominate. As the empirical findings verify, neither
of these regimes fully covers real software growth. In
particular, the heterogeneous degree distribution of newly
added nodes cannot be neglected.
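
A simulation of the second limiting case can be sketched in a few lines of Python. This is not the simulation code behind Fig. 2, which is not given in the text; it is a minimal sketch under the following assumptions: the initial degree is drawn so that $\mathrm{Prob}[k_0 \geq k] = k^{1-\alpha}$ for integer $k \geq 1$, it is capped at the number of existing nodes, targets are distinct existing nodes chosen with probability proportional to their degree, and the network starts from $n_0 = 2$ connected nodes. All names are hypothetical.

```python
import random


def draw_initial_degree(alpha, n_existing, rng):
    """Draw k0 with Prob[k0 >= k] = k**(1 - alpha), capped at n_existing."""
    u = 1.0 - rng.random()                    # uniform on (0, 1]
    if u <= n_existing ** (1.0 - alpha):      # draw would reach the cap
        return n_existing
    return int(u ** (-1.0 / (alpha - 1.0)))   # floor of a Pareto variate


def grow_network(n_final, alpha, seed=0):
    """Grow a network node by node; return degrees and (N, links) history."""
    rng = random.Random(seed)
    degree = [1, 1]            # n0 = 2 nodes joined by one link
    stubs = [0, 1]             # node i appears degree[i] times in this list
    history = [(2, 1)]
    while len(degree) < n_final:
        n_existing = len(degree)
        k0 = draw_initial_degree(alpha, n_existing, rng)
        targets = set()
        while len(targets) < k0:
            # A uniform pick from stubs selects a node with probability k / K.
            targets.add(rng.choice(stubs))
        new_node = n_existing
        degree.append(k0)
        for target in targets:
            degree[target] += 1
            stubs.append(target)
        stubs.extend([new_node] * k0)
        history.append((len(degree), len(stubs) // 2))
    return degree, history


degree, history = grow_network(500, alpha=2.5, seed=1)
print("nodes:", history[-1][0], "links:", history[-1][1])
```

The recorded `history` gives $K(N)$ for an estimate of $\beta$, and the final `degree` list gives the degree distribution for an estimate of $\gamma$; larger `n_final` and averages over several seeds would be needed for a meaningful comparison with Eqs. (9) and (12).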

Finally, we wish to point to the self-organizing dynamics
observed in OSS, which turns an initially accelerated
network growth ($\beta > 1$) into a sustainable one
($\beta \to 1$) found for mature projects. This prevents a
collapse of the software growth due to non-linearly increasing
dependencies between classes. Interestingly, $\beta$, which
describes the effort (social activity) of developers adding
new classes to the software, is found to be closely related
to the other two exponents $\alpha$, $\gamma$, describing a very
different 'dimension' of the software evolution, namely software
engineering. This may shed new light on the underlying
principles of software design and software project
management.